ABSTRACT


A condition is given for a context-free grammar to be unambiguous.
This condition is proved to be both necessary and sufficient. A
class of context-free grammars called first-character-recognition
grammars (or fcr grammars) is defined. These grammars obviously
satisfy the necessary and sufficient condition; consequently, they
are unambiguous. It is shown to be a decidable question whether
a given grammar is an fcr grammar. Many programming languages can
be described by fcr grammars; ALGOL can be so described, except for
the distinction between arithmetic and Boolean expressions.


INTRODUCTION


This paper employs a formalism that has become standard for context-free
languages. Ginsburg [1] has reported the results of a number of studies in
this area. To provide background information, some results of those studies
are summarized in the following paragraphs.

Writing two words together indicates concatenation; writing two sets of words
together indicates complex product. For an arbitrary set $E$, $E^*$ is the set of
all words over $E$, i.e., all the finite sequences of elements from $E$. In
particular, $E^*$ contains the empty word $\epsilon$. If $w$ is a word, then $|w|$
is the number of elements in $w$. If $xyz = u$, then $y$ is said to be a subword
of $u$, $x$ is said to be an initial subword of $u$, and $z$ is said to be a
terminal subword of $u$. A subword of $u$ that is distinct from $u$ is called a
proper subword.

Definition. A grammar $G$ is a 4-tuple $(V, \Sigma, P, \sigma)$, where $V$ is a
finite set ("vocabulary"), $\Sigma$ is a subset of $V$ ("letters"), $\sigma$ is an
element of $V - \Sigma$, and $P$ (the set of "productions") is a finite set of
ordered pairs of the form $\xi \to w$ with $\xi$ in $V - \Sigma$ and $w$ in $V^*$.

Definition. By a node from $v$ in $V$ is meant a sequence of positive integers
$(1, i_1, i_2, \ldots, i_k)$ with $0 \le k$ such that if $1 \le k$ there is a
sequence of productions $v_1 \to w_1$, $v_2 \to w_2$, $v_3 \to w_3$, ...,
$v_k \to w_k$ such that (1) $v_1$ is $v$, (2) $v_{j+1}$ is the $i_j$-th term of
$w_j$ for $j < k$, (3) if $w_k = \epsilon$, $i_k = 1$ and (4) if
$w_k \ne \epsilon$, $i_k \le |w_k|$.

When $1 \le k$ and $w_k \ne \epsilon$, we call the $i_k$-th term of $w_k$ the
label of the node $(1, i_1, \ldots, i_k)$. When $1 \le k$ and $w_k = \epsilon$,
we call $\epsilon$ the label of the node. The letter $v$ is the label of the
node $(1)$.

The node $(1, i_1, \ldots, i_k)$ is called terminal if $w_k = \epsilon$ or the
$i_k$-th term of $w_k$ is in $\Sigma$. The sequence $(1)$ is considered a
terminal node from all $v$ in $\Sigma$.


Definition. The statement that $T$ is a generation tree from $v$ in $V$ means that
$T$ is a collection of nodes from $v$ such that: (1) the sequence $(1)$ is in $T$;
(2) if $(1, i_1, i_2, \ldots, i_k)$ is a non-terminal node of $T$, then there is
one and only one $i$ such that $(1, i_1, i_2, i_3, \ldots, i_k, j)$ is in $T$; and
(3) if $(1, i_1, i_2, i_3, \ldots, i_k)$ is in $T$ then
$(1, i_1, i_2, i_3, \ldots, i_{k-1})$ is also in $T$. The sequence $(1)$ is called
the root of the tree and its label is $v$. From this definition it follows that
the generation tree for $v$ in $\Sigma$ contains only the node $(1)$.


[Figure: Generation tree for the word aabb in a grammar whose productions are
$\sigma \to a \sigma b$ and $\sigma \to \epsilon$. Node labels recoverable from
the figure include $(1, 2, 2, 1)$.]

Definition. By the length of the generation tree $T$ is meant one less than the
maximum length of the sequences of integers in $T$. The length of the generation
tree in the example is 3.

Notation. If each of $(1, i_1, i_2, \ldots, i_m)$ and $(1, j_1, j_2, \ldots, j_n)$
is a node from $v$ then write $(1, i_1, \ldots, i_m) < (1, j_1, \ldots, j_n)$ to
mean that either (1) $m < n$ and $i_k = j_k$ for $1 \le k \le m$ or (2) if $k$ is
the least integer such that $i_k \ne j_k$ then $i_k < j_k$.

Definition. Let all the terminal nodes of the generation tree $T$ be arranged in
a sequence $N_1, N_2, \ldots, N_k$ such that $N_i < N_{i+1}$ for $1 \le i < k$.
Let $B_i$ be the label of $N_i$ for $1 \le i \le k$. The word $B_1 \ldots B_k$ is
called the sentence of $T$ and is denoted by $S(T)$.
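
The ordering of nodes and the sentence of a tree can be illustrated with a small sketch. It assumes that a tree is represented only by its terminal nodes, stored as Python tuples mapped to their labels, with the empty string standing for $\epsilon$; the names `sentence`, `length`, and `terminal_nodes` are illustrative and do not come from the paper. Python's built-in tuple comparison happens to coincide with the relation $<$ defined above (a proper prefix precedes its extensions, and otherwise the first differing entry decides).

```python
def sentence(terminal_nodes):
    """Concatenate the labels of the terminal nodes in the node order '<'.

    terminal_nodes maps each terminal node (a tuple of positive integers)
    to its label; "" stands for the empty word.
    """
    return "".join(terminal_nodes[node] for node in sorted(terminal_nodes))


def length(nodes):
    """One less than the maximum length of the integer sequences."""
    return max(len(node) for node in nodes) - 1


# Terminal nodes of the tree for aabb, assuming the productions
# sigma -> a sigma b and sigma -> epsilon.
terminal_nodes = {
    (1, 1): "a",
    (1, 2, 1): "a",
    (1, 2, 2, 1): "",
    (1, 2, 3): "b",
    (1, 3): "b",
}

print(sentence(terminal_nodes))  # aabb
print(length(terminal_nodes))    # 3
```

Using only the terminal nodes for `length` is sufficient here because a longest sequence in a tree is always a terminal node.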

Definition. For $v$ in $V$, $L(v) = \{ x \mid \exists$ a generation tree from $v$
whose sentence is $x \}$. $L(v)$ is called the language of $v$. For $v_1$, $v_2$
in $V$, let $L(v_1 v_2) = L(v_1) L(v_2)$. The function $L$ is now defined for all
$v$ in $V \cup V^2$. The language of a grammar is usually defined in terms of a
sequence of steps in which members of $V^*$ are rewritten; see Bar-Hillel [2]. It
should be clear to the reader that these two definitions are equivalent.

Definition. The grammar $G = (V, \Sigma, P, \sigma)$ is said to be ambiguous at
$v$ in $V$ if there are two distinct generation trees from $v$ which produce the
same word.

The entire grammar is said to be ambiguous if it is ambiguous at $\sigma$.

Definition. A grammar is said to be binary if each production is either of
the form $v \to \epsilon$ or $v \to v_1 v_2$ with $v_1$, $v_2$ in $V$.
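
As an illustration of ambiguity in a binary grammar, the following sketch counts the generation trees of bounded length from a symbol that produce a given word. It is not part of the paper: the representation (a dictionary from each non-letter to a list of right sides, `()` for $\epsilon$ and a pair for $v_1 v_2$), the function name `make_tree_counter`, and the example grammar are all assumptions made for the example. The length bound keeps the count finite even when a grammar admits infinitely many trees for one word.

```python
from functools import lru_cache


def make_tree_counter(productions, letters):
    """Return count(v, word, max_len): the number of generation trees from v
    of length <= max_len whose sentence is word (binary grammar assumed)."""

    @lru_cache(maxsize=None)
    def count(v, word, max_len):
        if v in letters:
            # The only tree from a letter is the single node (1).
            return 1 if word == v else 0
        if max_len == 0:
            return 0  # a tree from a non-letter has length at least 1
        total = 0
        for rhs in productions.get(v, ()):
            if rhs == ():
                total += 1 if word == "" else 0
            else:
                u1, u2 = rhs
                for i in range(len(word) + 1):
                    total += (count(u1, word[:i], max_len - 1)
                              * count(u2, word[i:], max_len - 1))
        return total

    return count


# Hypothetical grammar: S -> S S, S -> a E, E -> epsilon.
productions = {"S": [("S", "S"), ("a", "E")], "E": [()]}
count = make_tree_counter(productions, frozenset({"a"}))

print(count("S", "a", 2))    # 1
print(count("S", "aaa", 4))  # 2
```

Two distinct trees produce aaa, so this example grammar is ambiguous at S in the sense of the definition above.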

THE  NECESSARY  AND  SUFFICIENT  CONDITION 

Definition. The set of terminal words that will be associated with $x$ in
$V \cup V^2$ is defined by
$H(x) = \{ z \mid \exists u, v$ both in $L(x)$ such that $u z = v \}$.

Definition. The set of initial words that will be associated with $x$ in
$V \cup V^2$ is defined by
$K(x) = \{ z \mid \exists u, v$ both in $L(x)$ such that $z u = v \}$.

Theorem. It is a necessary and sufficient condition for a binary grammar
$G = (V, \Sigma, P, \sigma)$ to be unambiguous at all $v$ in $V$ that: (1) if
$\xi \to x_1 x_2$ and $\xi \to y_1 y_2$ are two distinct productions of $G$ then
$L(x_1 x_2) \cap L(y_1 y_2) = \emptyset$ and (2) if $\xi \to x y$ is a production
of $G$ then $H(x) \cap K(y) = \emptyset$.

Proof. Since the condition is obviously necessary, we proceed to prove that it
is sufficient. The grammar $G$ will now be shown unambiguous by induction on the
lengths of the generation trees. Assume that for all $\xi$ in $V$ there do not
exist two distinct generation trees of length $\le n$ from $\xi$ which produce
the same word in $\Sigma^*$. Assume furthermore that there does exist a $v$ in $V$
and a pair of distinct generation trees $X$ and $Y$ from $v$ with length
$\le n+1$ which produce the same word in $\Sigma^*$.

Case 1. The tree $X$ begins with the production $v \to x$ and the tree $Y$ begins
with a different production $v \to y$, where $x$ and $y$ are in
$V^2 \cup \{\epsilon\}$. Then $S(X) = S(Y)$ is in $L(x) \cap L(y)$, which is
contrary to the hypothesis.

Case 2. Both trees begin with the same production $v \to u_1 u_2$. Just below the
root of $X$ are two subtrees $X_1$ and $X_2$ from $u_1$ and $u_2$ respectively.
Let $Y_1$ and $Y_2$ be similar subtrees of $Y$.

Case 2.1 $S(X_1) = S(Y_1)$. Then $S(X_2) = S(Y_2)$. Since the subtrees $X_1$ and
$Y_1$ have length $\le n$, they cannot be involved in an ambiguity from $u_1$;
therefore $X_1 = Y_1$. Similarly, $X_2 = Y_2$ and thus $X = Y$, which contradicts
the contrary assumption.

Case 2.2 $S(X_1)$ is a proper initial subword of $S(Y_1)$. Let $z$ be a word such
that $S(X_1) z = S(Y_1)$.

[Diagram: the word $S(X_1) S(X_2) = S(Y_1) S(Y_2)$, with $z$ marking the segment
between the end of $S(X_1)$ and the end of $S(Y_1)$.]

Since $z$ is in $H(u_1)$ and $K(u_2)$, the hypothesis of the theorem is
contradicted.

Case 2.3 $S(Y_1)$ is a proper initial subword of $S(X_1)$. Similar to case 2.2.

Q.E.D.

FIRST-CHARACTER-RECOGNITION GRAMMARS

Definition. Let $F(x)$ denote the set of all first letters of words in $L(x)$ and
let $E(x)$ denote the set of all first letters of words in $H(x)$. Let $Q(x)$ be
the predicate "$\epsilon$ is in $L(x)$."

Definition. The statement that $G = (V, \Sigma, P, \sigma)$ is a
first-character-recognition grammar (or an fcr grammar) means that $G$ is a
binary grammar and (1) if $\xi \to x_1 x_2$ and $\xi \to y_1 y_2$ are distinct
productions of $G$, then $F(x_1 x_2) \cap F(y_1 y_2) = \emptyset$ and either
$\sim Q(x_1 x_2)$ or $\sim Q(y_1 y_2)$ and (2) if $\xi \to x y$ is a production
of $G$, then $E(x) \cap F(y) = \emptyset$.

Theorem.  Every  fcr  grammar  is  unambiguous. 


This  theorem  obviously  follows  from  the  necessary  and  sufficient  condition. 

The computability of the predicate $Q(x)$ follows from the theorem that it is
decidable whether any word, including $\epsilon$, belongs to a language. For a
proof of this theorem see Chomsky or Bar-Hillel [2]. To prove that $F(x)$ is
computable we can make use of the theorem of Ginsburg [3] that states that the
image of a context-free language under a finite state transducer is itself a
context-free language. The transducer needed is a simple one that outputs the
first letter of each word given it. From the given grammar, the transducer
theorem gives us a new grammar whose language is $F(x)$. We use the decidability
algorithm mentioned above to determine which letters are words of the new
grammar.

At this point, more direct methods of computing $Q(x)$ and $F(x)$ will be given.
These methods employ ascending sequences, similar to those used by
Bar-Hillel [2].


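For $v$, $v_1$, $v_2$ in $V$ and $1 \le n$:

$Q_1(v) \equiv (v \to \epsilon)$

$Q_{n+1}(v) \equiv Q_n(v) \lor \exists u_1, u_2 \, (v \to u_1 u_2$ and $Q_n(u_1)$ and $Q_n(u_2))$

$Q_n(v_1 v_2) \equiv Q_n(v_1) \land Q_n(v_2)$

It can be easily proven that there exists $k$ such that for $p$ in $V \cup V^2$:

$Q_1(p) \Rightarrow Q_2(p) \Rightarrow Q_3(p) \Rightarrow \cdots \Rightarrow Q_k(p) \equiv Q_{k+1}(p) \equiv \cdots$ and $Q_k(p) \equiv Q(p)$.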
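
The ascending sequence for $Q$ can be sketched as a fixed-point iteration. The representation is an assumption made for this example, not the paper's: `productions` maps each non-letter to a list of right sides, with `()` standing for $\epsilon$ and a pair `(u1, u2)` for $u_1 u_2$. Each pass of the loop corresponds to moving from $Q_n$ to $Q_{n+1}$, and the loop stops at the first $k$ with $Q_k = Q_{k+1}$.

```python
def nullable(productions):
    """Return the set of non-letters v for which Q(v) holds."""
    # Q_1: v -> epsilon is a production.
    q = {v for v, rhss in productions.items() if () in rhss}
    while True:
        # Q_{n+1}: Q_n, or some production v -> u1 u2 with Q_n(u1) and Q_n(u2).
        nxt = q | {
            v
            for v, rhss in productions.items()
            if any(len(r) == 2 and r[0] in q and r[1] in q for r in rhss)
        }
        if nxt == q:
            return q
        q = nxt


# Hypothetical grammar: S -> A B, A -> epsilon, B -> A A, C -> a D, D -> epsilon.
grammar = {
    "S": [("A", "B")],
    "A": [()],
    "B": [("A", "A")],
    "C": [("a", "D")],
    "D": [()],
}

print(sorted(nullable(grammar)))  # ['A', 'B', 'D', 'S']
```

Since the set can only grow and is bounded by the number of non-letters, the loop runs at most that many times, which mirrors the existence of $k$ claimed above.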

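For $v$, $v_1$, $v_2$ in $V$ and $1 \le n$:

$F_1(v) = \{ x \mid x$ in $\Sigma$ and either $v = x$ or $\exists y \, (v \to x y) \}$

$F_{n+1}(v) = F_n(v) \cup \{ x \mid \exists u_1, u_2 \, (v \to u_1 u_2$ and either $x$ in $F_n(u_1)$ or $Q_n(u_1)$ and $x$ in $F_n(u_2)) \}$

$F_n(v_1 v_2) = F_n(v_1) \cup \{ x \mid Q_n(v_1)$ and $x$ in $F_n(v_2) \}$

It can be easily proven that there exists $k$ such that for $p$ in $V \cup V^2$:

$F_1(p) \subseteq F_2(p) \subseteq F_3(p) \subseteq \cdots \subseteq F_k(p) = F_{k+1}(p) = \cdots$ and $F_k(p) = F(p)$.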
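
A similar sketch computes the limit of the chain $F_n$. As before, the dictionary representation and the function names are assumptions made for illustration. One simplification is named plainly: the code uses the converged predicate $Q$ in every pass rather than the stage-indexed $Q_n$; because all the operations are monotone, the limit reached should be the same.

```python
def nullable(productions):
    """Set of non-letters that derive the empty word (the limit of Q_n)."""
    q = {v for v, rhss in productions.items() if () in rhss}
    while True:
        nxt = q | {
            v
            for v, rhss in productions.items()
            if any(len(r) == 2 and r[0] in q and r[1] in q for r in rhss)
        }
        if nxt == q:
            return q
        q = nxt


def first_letters(productions, letters):
    """Map every symbol to its set of first letters (the limit of F_n)."""
    q = nullable(productions)
    f = {x: {x} for x in letters}               # a letter starts only itself
    f.update({v: set() for v in productions})
    changed = True
    while changed:
        changed = False
        for v, rhss in productions.items():
            for r in rhss:
                if len(r) != 2:
                    continue                     # v -> epsilon adds nothing
                u1, u2 = r
                # x in F(u1), or Q(u1) and x in F(u2)
                new = f[u1] | (f[u2] if u1 in q else set())
                if not new <= f[v]:
                    f[v] |= new
                    changed = True
    return f


# Hypothetical grammar: S -> A B, A -> a A | epsilon, B -> b C, C -> epsilon.
grammar = {
    "S": [("A", "B")],
    "A": [("a", "A"), ()],
    "B": [("b", "C")],
    "C": [()],
}
f = first_letters(grammar, {"a", "b"})

print(sorted(f["S"]))  # ['a', 'b']
print(sorted(f["A"]))  # ['a']
```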

The function $E(p)$ is not so easy to handle. As before, a chain will be
defined that converges to a function $E_k(p)$. Suppose we make a new condition
by replacing $E(p)$ with $E_k(p)$ in the sufficient condition already given.
These two conditions will be proven to be equivalent. The method of proof is
to show that for all $p$ in $V$, $E_k(p) \subseteq E(p)$ and, if the new
condition is satisfied, then for all $p$ in $V$, $E(p) \subseteq E_k(p)$.

For $v$ in $V$ and $1 \le n$:

$E_1(v) = \emptyset$

$E_{n+1}(v) = E_n(v) \cup \{ x \mid \exists u_1, u_2 \, (v \to u_1 u_2$ and either $x$ in $E_n(u_2)$ or $Q_n(u_2)$ and $x$ in $E_n(u_1)$ or $Q_n(v)$ and $x$ in $F_n(u_1)) \}$

It can easily be proven that there exists $k$ such that for $p$ in $V$:

$E_1(p) \subseteq E_2(p) \subseteq \cdots \subseteq E_k(p) = E_{k+1}(p) = \cdots$

Lemma. For all $v$ in $V$, $E_k(v) \subseteq E(v)$.

This is also easily proven.
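
The chain $E_n$ can be computed in the same style. The sketch below is an illustration under stated assumptions, not the paper's procedure: grammars are dictionaries from non-letters to lists of right sides (`()` for $\epsilon$, pairs otherwise), the helper names are invented, and the converged $Q$ and $F$ are used in place of the stage-indexed $Q_n$ and $F_n$, which should not change the limit because every step is monotone.

```python
def nullable(productions):
    """Set of non-letters that derive the empty word."""
    q = {v for v, rhss in productions.items() if () in rhss}
    while True:
        nxt = q | {v for v, rhss in productions.items()
                   if any(len(r) == 2 and r[0] in q and r[1] in q for r in rhss)}
        if nxt == q:
            return q
        q = nxt


def iterate(productions, start, step):
    """Grow the sets in `start` with step(v, u1, u2, sets) until stable."""
    sets = {s: set(xs) for s, xs in start.items()}
    changed = True
    while changed:
        changed = False
        for v, rhss in productions.items():
            for r in rhss:
                if len(r) == 2:
                    new = step(v, r[0], r[1], sets)
                    if not new <= sets[v]:
                        sets[v] |= new
                        changed = True
    return sets


def e_sets(productions, letters):
    """Map every symbol to the limit E_k of the chain E_n."""
    q = nullable(productions)
    empty = {v: set() for v in productions}
    f = iterate(productions, {**{x: {x} for x in letters}, **empty},
                lambda v, u1, u2, s: s[u1] | (s[u2] if u1 in q else set()))
    # x in E(u2), or Q(u2) and x in E(u1), or Q(v) and x in F(u1)
    return iterate(productions, {**{x: set() for x in letters}, **empty},
                   lambda v, u1, u2, s: s[u2]
                   | (s[u1] if u2 in q else set())
                   | (f[u1] if v in q else set()))


# Hypothetical grammar: S -> L B, L -> a L | epsilon, B -> b C, C -> epsilon.
grammar = {
    "S": [("L", "B")],
    "L": [("a", "L"), ()],
    "B": [("b", "C")],
    "C": [()],
}
e = e_sets(grammar, {"a", "b"})

print(sorted(e["L"]))  # ['a']
print(sorted(e["S"]))  # []
```

In this example $L(L)$ is the set of all strings of a's, so any word of $L(L)$ can be extended by further a's, which agrees with the computed value for `e["L"]`.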


Definition. For $p$ in $V$ and $1 \le n$, $E^n(p) = \{ x \mid \exists$ two
generation trees $X$ and $Y$ from $p$ of length $\le n$ and a word $w$ in
$\Sigma^*$ such that $S(X) w = S(Y)$ and $x$ is the first letter of $w \}$.

Lemma. Suppose that $G = (V, \Sigma, P, \sigma)$ is a binary grammar such that
(1) if $\xi \to x_1 x_2$ and $\xi \to y_1 y_2$ are distinct productions of $G$
then $F(x_1 x_2) \cap F(y_1 y_2) = \emptyset$ and either $\sim Q(x_1 x_2)$ or
$\sim Q(y_1 y_2)$ and (2) if $\xi \to x y$ then $E_k(x) \cap F(y) = \emptyset$.
Then for $v$ in $V - \Sigma$ and $1 \le n$, $E^n(v) \subseteq E_k(v)$.


Proof.

The lemma is proven by induction in the following manner. Let $n$ be some
integer $\ge 1$. Assume that for all $v$, $E^n(v) \subseteq E_k(v)$. Let $p$ be
in $V - \Sigma$ and let $s$ be in $E^{n+1}(p)$.

It must be shown that $E^{n+1}(p) \subseteq E_k(p)$, in other words, that $s$ is
in $E_k(p)$.

There exists a pair of generation trees $X$ and $Y$ from $p$ such that (1) both
$X$ and $Y$ have length $\le n+1$ and (2) $\exists z \, (S(X) z = S(Y)$ and $s$
is the first letter in $z)$.


Case 1 $S(X) = \epsilon$

Let $p \to y_1 y_2$ be the first production of the tree $Y$. The length of the
subtree from $y_1$ is $\le n$; therefore, $s$ is in $F_n(y_1)$. Since $Q_n(p)$ is
true, it follows that $s$ is in $E_{n+1}(p)$ and, consequently, in $E_k(p)$.

Case 2 $S(X) \ne \epsilon$

Let $p \to x$ and $p \to y$ be the first productions of the trees $X$ and $Y$
respectively, with $x$ and $y$ in $V^2$. Assume these productions are distinct.
Then $F(x) \cap F(y) = \emptyset$. This is impossible since $S(X)$ is a non-empty
initial subword of $S(Y)$, so its first letter lies in both $F(x)$ and $F(y)$.
Let $p \to v_1 v_2$ be the first production of both trees $X$ and $Y$. Just below
the root of $X$ are two subtrees $X_1$ and $X_2$ from $v_1$ and $v_2$
respectively. Let $Y_1$ and $Y_2$ be similar subtrees for $Y$.

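Case 2.1 $S(X_1) = S(Y_1)$

[Diagram: $S(X_1) S(X_2)$ aligned above $S(Y_1) S(Y_2)$, with $S(X_1)$ and
$S(Y_1)$ ending at the same point and $s$ marking the first letter of $S(Y)$
after the end of $S(X_2)$.]

$s$ is in $E^n(v_2)$; $s$ is in $E_k(v_2)$; $s$ is in $E_{k+1}(p)$; $s$ is in
$E_k(p)$.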

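Case 2.2 $S(X_1)$ is a proper initial subword of $S(Y_1)$

Case 2.2.1 $S(X_2) = \epsilon$

[Diagram: $S(X_1)$ aligned above $S(Y_1) S(Y_2)$, with $s$ marking the first
letter of $S(Y_1)$ after the end of $S(X_1)$.]

$s$ is in $E^n(v_1)$; $s$ is in $E_k(v_1)$; $Q_k(v_2)$ is true; $s$ is in
$E_{k+1}(p)$; $s$ is in $E_k(p)$.

Case 2.2.2 $S(X_2) \ne \epsilon$

[Diagram: $S(X_1) S(X_2)$ aligned above $S(Y_1) S(Y_2)$, with $t$ marking the
first letter after the end of $S(X_1)$, which lies inside $S(Y_1)$.]

Let $t$ be the first letter in $S(X_2)$. $t$ is in $E^n(v_1)$; $t$ is in
$E_k(v_1)$; $t$ is in $F(v_2)$; $E_k(v_1) \cap F(v_2) \ne \emptyset$;
contradiction.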

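Case 2.3 $S(Y_1)$ is a proper initial subword of $S(X_1)$

[Diagram: $S(X_1) S(X_2)$ aligned above $S(Y_1) S(Y_2)$, with $t$ marking the
first letter after the end of $S(Y_1)$, which lies inside $S(X_1)$.]

Similar to case 2.2.2.

Q.E.D.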

Theorem. If $G = (V, \Sigma, P, \sigma)$ is a binary grammar, then $G$ is an fcr
grammar if and only if (1) if $\xi \to x_1 x_2$ and $\xi \to y_1 y_2$ are
distinct productions then $F(x_1 x_2) \cap F(y_1 y_2) = \emptyset$ and either
$\sim Q(x_1 x_2)$ or $\sim Q(y_1 y_2)$ and (2) if $\xi \to x y$ then
$E_k(x) \cap F(y) = \emptyset$.

Proof

Part 1. If statement 1 then statement 2.

$E_k(x) \subseteq E(x)$; $E(x) \cap F(y) = \emptyset$;
$E_k(x) \cap F(y) = \emptyset$.

Part 2. If statement 2 then statement 1.

For $v$ in $V - \Sigma$ and $1 \le n$, $E^n(v) \subseteq E_k(v)$.

For $v$ in $V - \Sigma$, $E(v) \subseteq E_k(v)$.

For $v$ in $\Sigma$, $E(v) = \emptyset$.

For $v$ in $V$, $E(v) \subseteq E_k(v)$; $E_k(v) \subseteq E(v)$;
$E(v) = E_k(v)$.

Q.E.D. 
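
The theorem reduces the fcr test to finite computations, which can be sketched as follows. This is an illustration only: the grammar representation (a dictionary from non-letters to lists of distinct right sides, `()` for $\epsilon$ and pairs otherwise) and the names `analyse` and `is_fcr` are assumptions, the three chains are iterated jointly rather than stage by stage, and a production $\xi \to \epsilon$ is treated in condition (1) as a right side with an empty set of first letters for which $Q$ holds.

```python
from itertools import combinations


def analyse(productions, letters):
    """Return (q, f, e): the limits of the chains Q_n, F_n and E_n."""
    q = set()
    f = {x: {x} for x in letters}
    e = {x: set() for x in letters}
    for v in productions:
        f[v], e[v] = set(), set()
    changed = True
    while changed:
        changed = False
        for v, rhss in productions.items():
            for r in rhss:
                if r == ():
                    if v not in q:
                        q.add(v)
                        changed = True
                    continue
                u1, u2 = r
                if u1 in q and u2 in q and v not in q:
                    q.add(v)
                    changed = True
                new_f = f[u1] | (f[u2] if u1 in q else set())
                new_e = (e[u2] | (e[u1] if u2 in q else set())
                         | (f[u1] if v in q else set()))
                if not new_f <= f[v]:
                    f[v] |= new_f
                    changed = True
                if not new_e <= e[v]:
                    e[v] |= new_e
                    changed = True
    return q, f, e


def is_fcr(productions, letters):
    """Check conditions (1) and (2) of the theorem for a binary grammar."""
    q, f, e = analyse(productions, letters)

    def first(r):       # F of a right side
        return set() if r == () else f[r[0]] | (f[r[1]] if r[0] in q else set())

    def null(r):        # Q of a right side
        return r == () or (r[0] in q and r[1] in q)

    for rhss in productions.values():
        for r1, r2 in combinations(rhss, 2):
            if first(r1) & first(r2) or (null(r1) and null(r2)):
                return False                    # condition (1) fails
        for r in rhss:
            if r != () and e[r[0]] & f[r[1]]:
                return False                    # condition (2) fails
    return True


letters = {"a", "b"}
g1 = {"S": [("L", "B")], "L": [("a", "L"), ()], "B": [("b", "C")], "C": [()]}
g2 = {"S": [("S", "S"), ("a", "E")], "E": [()]}   # fails condition (1)
g3 = {"S": [("L", "L")], "L": [("a", "L"), ()]}   # fails condition (2)

print(is_fcr(g1, letters), is_fcr(g2, letters), is_fcr(g3, letters))
# True False False
```

The second and third example grammars are ambiguous (aaa has two trees in the second, and a has two trees in the third), so their rejection is consistent with the theorem that every fcr grammar is unambiguous.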

CONCLUSION 

Grammars that are used in a top-to-bottom syntax scan without backup are almost
always fcr grammars. Such grammars have been used in syntax-directed compilers
by Schorre [6], Schmidt and Schneider and Johnson [?]. The only difficulty
in writing such a grammar for ALGOL appears to be that the distinction between
algebraic and Boolean expressions is lost because they both begin with an
arbitrary number of open parentheses. In other words, the language of the
rewritten grammar would contain all the words in ALGOL and, in addition, words
that had algebraic expressions where only Boolean expressions should be (for
example, ... IF A + B THEN ...).